# Multi-speaker Support

CSM (Conversational Speech Model) is a 1B-parameter speech generation model developed by Sesame, capable of generating RVQ audio codes from text and audio inputs.

Speech Synthesis English

Csm 1b Safetensors Fp16

CSM (Conversational Speech Model) is a 1-billion-parameter speech generation model developed by Sesame, capable of generating RVQ audio codes from text and audio inputs.

Speech Synthesis

Transformers English

CSM is a 1B-parameter speech generation model developed by Sesame, capable of generating RVQ audio codes from text and audio inputs, supporting context-aware speech generation.

Speech Synthesis English

Csm 1b Safetensors Quants

CSM (Conversational Speech Model) is a 1-billion-parameter speech generation model developed by Sesame, capable of generating RVQ audio codes from text and audio inputs.

Speech Synthesis

Transformers English

A PyTorch-based text-to-speech model supporting Chinese speech synthesis, developed and released by SesameAILabs.

Speech Synthesis

Yourtts Formosan Only Ithuan

Experimental speech synthesis model for the Amis and Taroko languages, trained on the ithuan dataset.

Speech Synthesis Other

Brazilian Portuguese text-to-speech model based on F5-TTS, supporting emotion tags and speaker feature control.

Speech Synthesis Other

Parler Tts Mini V1.1

Parler-TTS Mini v1.1 is a lightweight text-to-speech model trained on 45,000 hours of audio data, capable of generating high-quality, natural-sounding speech with controllable features through simple text prompts.

Speech Synthesis

Transformers English

Speecht5 Tts Tr V1.0

A Turkish text-to-speech model fine-tuned from Microsoft SpeechT5, capable of generating natural-sounding speech.

Speech Synthesis

Transformers Other

Parler Tts Tiny V1

Lightweight text-to-speech model trained on 45,000 hours of audio data, with voice attributes controllable through text prompts.

Speech Synthesis

Transformers English

Parler Tts Mini Expresso

Parler-TTS Mini: Expresso is a lightweight text-to-speech model based on Parler-TTS Mini v0.1 and fine-tuned on the Expresso dataset, supporting emotion and speaker control.

Speech Synthesis

Transformers English

Speecht5 Finetuned Facebook Voxpopuli French

A text-to-speech model based on microsoft/speecht5_tts, fine-tuned on the VoxPopuli French dataset.

Speech Synthesis

© 2025 AIbase